Video-realistic synthetic speech with a parametric visual speech synthesizer

Author

  • Sascha Fagel
Abstract

The author presents a new face module for MASSY, the Modular Audiovisual Speech SYnthesizer [1]. Within this face module the system combines two approaches to visual speech synthesis. Although the articulation space is parameterized in terms of movements of the articulators, the visual synthesis is image-based (video-realistic). The high-level visual speech synthesis generates a sequence of control commands for the visible articulation. The video synthesis searches an image database for appropriate video frames. If no image with facial properties matching the control commands is found, the missing image is generated by deforming a neutral image. MPEG-4 facial definition parameters (FDPs) [2] and additional points in the mouth opening area and around the lower jaw are defined in the neutral image as feature points. A two-dimensional displacement vector is defined for each feature point. For the image deformation, a mesh of triangles connecting the feature points is used. The displacement vector of a point inside a triangle is interpolated from the displacement vectors of the triangle's vertices. Hence, the video synthesis algorithm can use either a database of appropriately annotated video frames or a single neutral image with specified feature points and displacement vectors. A simple software tool for marking the feature points in the image was developed. Other well-known data-based (image-based) audio-visual speech synthesis systems such as MIKETALK [3] and VIDEO REWRITE [4] concatenate prerecorded video sequences. The presented system demonstrates the compatibility of parametric and data-based visual speech synthesis approaches.
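The frame lookup and the triangle-based interpolation described in the abstract can be illustrated with a minimal Python sketch. It is not the MASSY implementation. The abstract does not name an interpolation scheme or a matching criterion, so the sketch assumes barycentric (linear) interpolation inside each triangle and a Euclidean distance with a tolerance for the database search. The names `select_frame`, `barycentric_weights`, `interpolate_displacement`, and `tolerance` are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]
Vector = Tuple[float, float]


def select_frame(database: Dict[str, Sequence[float]],
                 target: Sequence[float],
                 tolerance: float) -> Optional[str]:
    """Return the id of the annotated frame closest to the target parameters.

    Assumption: frames are annotated with articulation parameter vectors and
    "appropriate" means Euclidean distance <= tolerance. Returns None when no
    frame qualifies, in which case a neutral image would be deformed instead.
    """
    best_id, best_dist = None, tolerance
    for frame_id, params in database.items():
        dist = math.dist(params, target)
        if dist <= best_dist:
            best_id, best_dist = frame_id, dist
    return best_id


def barycentric_weights(p: Point, a: Point, b: Point, c: Point):
    """Barycentric coordinates of p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        raise ValueError("degenerate triangle")
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


def interpolate_displacement(p: Point,
                             triangle: Tuple[Point, Point, Point],
                             displacements: Tuple[Vector, Vector, Vector]) -> Vector:
    """Displacement at p as the weighted sum of the three vertex displacements."""
    weights = barycentric_weights(p, *triangle)
    dx = sum(w * d[0] for w, d in zip(weights, displacements))
    dy = sum(w * d[1] for w, d in zip(weights, displacements))
    return dx, dy
```

With this scheme a point that coincides with a feature point receives exactly that feature point's displacement, and the displacement varies linearly across each triangle, so adjacent triangles agree along their shared edges.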

Similar resources

Merging methods of speech visualization

The author presents MASSY, the MODULAR AUDIOVISUAL SPEECH SYNTHESIZER. The system combines two approaches to visual speech synthesis. Two control models are implemented: a (data-based) di-viseme model and a (rule-based) dominance model, both of which produce control commands in a parameterized articulation space. Analogously, two visualization methods are implemented: an image-based (video-realistic...

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One of the interesting topics in the multimedia domain is enabling computers to produce speech. Speech synthesis grants the computer the human ability of speech production. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...

Wideband Parametric Speech Synthesis Using Warped Linear Prediction

This paper studies the use of warped linear prediction (WLP) for wideband parametric speech synthesis. As the sampling frequency is increased from the usual 16 kHz, linear frequency resolution of conventional linear prediction (LP) cannot efficiently model the speech spectrum. By using frequency warping that weights perceptually the most important formant information, spectral models with bette...

A text-to-audiovisual-speech synthesizer for French

An audiovisual speech synthesizer from unlimited French text is here presented. It uses a 3-D parametric model of the face. The facial model is controlled by eight parameters. Target values have been assigned to the parameters, for each French viseme, based upon measurements made on a human speaker. Parameter trajectories are modeled by means of dominance functions associated with each paramete...

iFACE: A 3D Synthetic Talking Face

We present the iFACE system, a visual speech synthesizer that provides a form of virtual face-to-face communication. The system provides an interactive tool for the user to customize a graphic head model for the virtual agent of a person based on his/her range data. The texture is mapped onto the customized model to achieve a realistic appearance. Face animations are produced by using text stre...

Publication date: 2004